Learning Fast and Slow: PROPEDEUTICA for Real-time Malware Detection
In this paper, we introduce and evaluate PROPEDEUTICA, a novel methodology and framework for efficient and effective real-time malware detection that leverages the best of conventional machine learning (ML) and deep learning (DL) algorithms. In PROPEDEUTICA, every software process in the system starts execution subject to a conventional ML detector for fast classification. If a piece of software receives a borderline classification, it is subjected to further analysis via more computationally expensive and more accurate DL methods, using our newly proposed DL algorithm DEEPMALWARE. In addition, we introduce delays to the execution of software undergoing DL analysis as a way to "buy time" for the analysis and to rate-limit the impact of possible malware on the system. We evaluated PROPEDEUTICA with a set of 9,115 malware samples and 877 commonly used benign software samples from various categories for the Windows OS. Our results show that the false positive rate for conventional ML methods can reach 20%, whereas for modern DL methods it is usually below 6%; however, the classification time for DL can be 100x longer than for conventional ML methods. PROPEDEUTICA improved the detection F1-score from 77.54% (conventional ML method) to 90.25% and reduced the detection time by 54.86%. The percentage of software subjected to DL analysis was approximately 40% on average, and the application of delays to software undergoing DL analysis reduced the detection time by approximately 10%. Finally, we found and discuss a discrepancy between detection accuracy offline (analysis after all traces are collected) and on-the-fly (analysis in tandem with trace collection). Our insights show that neither conventional ML nor modern DL-based malware detectors in isolation can meet the needs of efficient and effective malware detection: high accuracy, a low false positive rate, and short classification time.
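To make the two-stage design concrete, here is a minimal Python sketch of the dispatch logic, assuming a fast screening classifier and an escalation threshold band; the names, thresholds, and detector interfaces are illustrative assumptions, not the authors' implementation:

    from dataclasses import dataclass
    from typing import Callable, Sequence

    @dataclass
    class Verdict:
        label: str        # "benign" or "malware"
        score: float      # estimated probability that the process is malicious
        escalated: bool   # True if the slow DL stage was consulted

    def classify(features: Sequence[float],
                 fast_ml: Callable[[Sequence[float]], float],   # fast, less accurate
                 deep_dl: Callable[[Sequence[float]], float],   # slow, more accurate
                 low: float = 0.3, high: float = 0.7) -> Verdict:
        """Screen with the cheap ML model; escalate borderline scores to DL."""
        score = fast_ml(features)
        escalated = low <= score <= high
        if escalated:
            # In the paper, the process would also be delayed (rate-limited)
            # here to "buy time" while the expensive DL analysis runs.
            score = deep_dl(features)
        return Verdict("malware" if score >= 0.5 else "benign", score, escalated)

Anything scoring outside the borderline band keeps the fast verdict, so only the ambiguous fraction of processes (roughly 40% in the paper's evaluation) pays the DL cost.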
Model Tuning or Prompt Tuning? A Study of Large Language Models for Clinical Concept and Relation Extraction
Objective: To develop soft prompt-based learning algorithms for large language models (LLMs), and to examine the shape of prompts, prompt tuning with frozen and unfrozen LLMs, transfer learning, and few-shot learning abilities.
Methods: We developed a soft prompt-based LLM model and compared 4 training strategies: (1) fine-tuning without prompts; (2) hard prompts with unfrozen LLMs; (3) soft prompts with unfrozen LLMs; and (4) soft prompts with frozen LLMs. We evaluated 7 pretrained LLMs using the 4 training strategies for clinical concept and relation extraction on two benchmark datasets. We evaluated the transfer learning ability of the prompt-based learning algorithms in a cross-institution setting, and also assessed their few-shot learning ability.
Results and Conclusion: When LLMs are unfrozen, GatorTron-3.9B with soft prompting achieves the best strict F1-scores of 0.9118 and 0.8604 for concept extraction, outperforming the traditional fine-tuning and hard prompt-based models by 0.6% to 3.1% and 1.2% to 2.9%, respectively; GatorTron-345M with soft prompting achieves the best F1-scores of 0.8332 and 0.7488 for end-to-end relation extraction, outperforming the other two models by 0.2% to 2% and 0.6% to 11.7%, respectively. When LLMs are frozen, small LLMs (i.e., 345 million parameters) have a large gap to close to be competitive with unfrozen models; scaling LLMs up to billions of parameters makes frozen LLMs competitive with unfrozen LLMs. For cross-institution evaluation, soft prompting with a frozen GatorTron-8.9B model achieved the best performance. This study demonstrates that (1) machines can learn soft prompts better than humans, (2) frozen LLMs have better few-shot and transfer learning abilities, facilitating multi-institution applications, and (3) frozen LLMs require large model sizes to be competitive.
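As an illustration of strategy (4), the following PyTorch sketch prepends learnable soft-prompt embeddings to the inputs of a frozen backbone so that only the prompt parameters receive gradients; the class, hidden size, and prompt length are assumptions for exposition, not the paper's exact setup:

    import torch
    import torch.nn as nn

    class SoftPromptModel(nn.Module):
        """Wraps a frozen backbone; only the soft prompt embeddings train."""

        def __init__(self, backbone: nn.Module, hidden_size: int, prompt_len: int = 20):
            super().__init__()
            self.backbone = backbone
            for p in self.backbone.parameters():
                p.requires_grad = False  # strategy (4): keep the LLM frozen
            # Learnable "soft prompt": prompt_len virtual token embeddings.
            self.soft_prompt = nn.Parameter(torch.randn(prompt_len, hidden_size) * 0.02)

        def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
            # input_embeds: (batch, seq_len, hidden) embedded input tokens.
            batch = input_embeds.size(0)
            prompt = self.soft_prompt.unsqueeze(0).expand(batch, -1, -1)
            # Prepend the virtual tokens and run the (frozen) backbone.
            return self.backbone(torch.cat([prompt, input_embeds], dim=1))

Because the backbone never changes, one frozen copy of the LLM can serve many tasks or institutions, with only the small prompt tensor trained and stored per task.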
On the Impact of Cross-Domain Data on German Language Models
Traditionally, large language models have been trained either on general web crawls or on domain-specific data. However, recent successes of generative large language models have shed light on the benefits of cross-domain datasets. To examine the significance of prioritizing data diversity over quality, we present a German dataset comprising texts from five domains, along with another dataset intended to contain high-quality data. By training a series of models ranging between 122M and 750M parameters on both datasets, we conduct a comprehensive benchmark on multiple downstream tasks. Our findings demonstrate that the models trained on the cross-domain dataset outperform those trained on quality data alone, with improvements over the previous state-of-the-art. The models are available at https://huggingface.co/ikim-uk-essen
A Study of Generative Large Language Model for Medical Research and Healthcare
There is enormous enthusiasm, as well as concern, about using large language models (LLMs) in healthcare, yet current assumptions are all based on general-purpose LLMs such as ChatGPT. This study develops a clinical generative LLM, GatorTronGPT, using 277 billion words of mixed clinical and English text with a GPT-3 architecture of 20 billion parameters. GatorTronGPT improves biomedical natural language processing for medical research. NLP models trained on synthetic text generated by GatorTronGPT outperform NLP models trained on real-world clinical text. A physician Turing test using a 1 (worst) to 9 (best) scale shows that there is no significant difference in linguistic readability (p = 0.22; 6.57 for GatorTronGPT vs. 6.93 for human text) or clinical relevance (p = 0.91; 7.0 for GatorTronGPT vs. 6.97 for human text), and that physicians cannot differentiate them (p < 0.001). This study provides insights into the opportunities and challenges of LLMs for medical research and healthcare.
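To illustrate the synthetic-text idea in general terms, the sketch below samples clinical-style text from a generative LLM with the Hugging Face transformers API; the model id and prompt are placeholders, not the GatorTronGPT release:

    from transformers import AutoModelForCausalLM, AutoTokenizer

    GEN_ID = "<generative-clinical-llm>"  # placeholder, not an official model id
    tok = AutoTokenizer.from_pretrained(GEN_ID)
    lm = AutoModelForCausalLM.from_pretrained(GEN_ID)

    prompt = "Chief complaint:"  # illustrative seed for one synthetic note
    ids = tok(prompt, return_tensors="pt").input_ids
    out = lm.generate(ids, max_new_tokens=200, do_sample=True, top_p=0.9)
    synthetic_note = tok.decode(out[0], skip_special_tokens=True)
    # Notes generated this way would form a synthetic corpus on which a
    # downstream NLP model (e.g., for concept extraction) is then trained.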
Bear: A Framework for Understanding Application Sensitivity to OS (Mis)Behavior
The Application of Tomato Plant Residue Compost and Plant Growth-Promoting Rhizobacteria Improves Soil Quality and Enhances the Ginger Field Soil Bacterial Community
Treating and utilizing vegetable residues may reduce waste and improve rhizosphere soil. This study explored the effects of tomato plant residue compost and plant growth-promoting rhizobacteria (PGPR) on the physicochemical properties and microbial community of ginger field soil. Four treatments were applied: no compost or PGPR (CK), compost (TC), compost + Bacillus subtilis (TC-BS), and compost + Bacillus amyloliquefaciens SQR9 (TC-BA). The results showed that, compared with CK, TC significantly increased soil organic matter, alkali-hydrolyzable nitrogen, available phosphorus, and available potassium by 17.34%, 21.66%, 19.56%, and 37.20%, respectively. Soil urease, neutral phosphatase, and sucrase activities increased by 55.89%, 35.59%, and 57.21%, respectively. The abundances of Chloroflexi, Gemmatimonadetes, and Bacillus increased by 1.40%, 1.80%, and 0.68%, respectively, while Firmicutes decreased by 0.80%. TC-BS improved soil bacterial diversity significantly more than CK and TC, and beneficial Proteobacteria, Acidobacteria, Chloroflexi, and Bacillus dominated in relative abundance. Principal coordinate analysis revealed significant differences in bacterial community structure among the treatments. Redundancy analysis indicated total potassium (p = 0.002), pH (p = 0.0012), and available phosphorus (p = 0.016) as the main factors driving community composition. In conclusion, B. subtilis inoculation in ginger field soil supplemented with tomato compost enhanced bacterial diversity, altered bacterial community structure, enriched beneficial microorganisms, and promoted a healthy rhizosphere.
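For readers unfamiliar with the ordination used above, the following minimal NumPy sketch implements classical principal coordinate analysis (PCoA) on a toy distance matrix; the distances are invented stand-ins for, e.g., Bray-Curtis dissimilarities between the treatment communities:

    import numpy as np

    def pcoa(dist: np.ndarray, k: int = 2) -> np.ndarray:
        """Classical PCoA: double-center the squared distance matrix and
        project samples onto the k largest eigenvectors."""
        n = dist.shape[0]
        centerer = np.eye(n) - np.ones((n, n)) / n
        gower = -0.5 * centerer @ (dist ** 2) @ centerer
        vals, vecs = np.linalg.eigh(gower)          # eigenvalues, ascending
        top = np.argsort(vals)[::-1][:k]            # indices of the k largest
        return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))

    # Toy distances among four samples (e.g., CK, TC, TC-BS, TC-BA communities).
    d = np.array([[0.0, 0.6, 0.7, 0.7],
                  [0.6, 0.0, 0.4, 0.3],
                  [0.7, 0.4, 0.0, 0.2],
                  [0.7, 0.3, 0.2, 0.0]])
    coords = pcoa(d)   # (4, 2) coordinates on the first two principal axes

Samples that plot far apart on these axes have dissimilar communities, which is how the treatment groups were separated in the analysis above.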
A large language model for electronic health records
There is increasing interest in developing artificial intelligence (AI) systems to process and interpret electronic health records (EHRs). Natural language processing (NLP) powered by pretrained language models is the key technology for medical AI systems that use clinical narratives. However, there are few clinical language models, and the largest trained on clinical text is comparatively small at 110 million parameters (compared with billions of parameters in the general domain). It is not clear how large clinical language models with billions of parameters can help medical AI systems utilize unstructured EHRs. In this study, we develop from scratch a large clinical language model, GatorTron, using >90 billion words of text (including >82 billion words of de-identified clinical text) and systematically evaluate it on five clinical NLP tasks: clinical concept extraction, medical relation extraction, semantic textual similarity, natural language inference (NLI), and medical question answering (MQA). We examine how (1) scaling up the number of parameters and (2) scaling up the size of the training data benefit these NLP tasks. GatorTron models scale the clinical language model from 110 million to 8.9 billion parameters and improve all five clinical NLP tasks (e.g., 9.6% and 9.5% accuracy improvements on NLI and MQA), which can be applied in medical AI systems to improve healthcare delivery. The GatorTron models are publicly available at: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/clara/models/gatortron_og
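As a hedged illustration of how such an encoder could be applied to one of the listed tasks (clinical concept extraction framed as token classification), assuming a checkpoint in Hugging Face format; the model id and label count below are placeholders, not official identifiers:

    from transformers import AutoModelForTokenClassification, AutoTokenizer

    MODEL_ID = "<gatortron-checkpoint>"  # placeholder; see the NGC link above
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForTokenClassification.from_pretrained(MODEL_ID, num_labels=5)

    text = "Patient denies chest pain but reports shortness of breath."
    inputs = tokenizer(text, return_tensors="pt")
    logits = model(**inputs).logits       # shape: (1, seq_len, num_labels)
    labels = logits.argmax(dim=-1)        # per-token label ids (e.g., BIO tags)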